ABSTRACT

Static analysis (aka offline analysis) of a model of an IP 
network is useful for understanding, debugging, and verify- 
ing packet flow properties of the network. There have been 
static analysis approaches proposed in the literature for net- 
works based on model checking as well as graph reachability. 
Abstract interpretation is a method that has typically been 
applied to static analysis of programs. We propose a new, 
abstract-interpretation based approach for analysis of net- 
works. We formalize our approach, mention its correctness 
guarantee, and demonstrate its flexibility in addressing mul- 
tiple network-analysis problems that have been previously 
solved via tailor-made approaches. Finally, we investigate 
applications of our analysis for two novel problems - auto- 
matically generating test packets, and inferring a high-level 
policy for the network - which have been addressed in the 
past only in the restricted single-node setting. 

1. INTRODUCTION 

Analysis of the flow of packets across an IP network is an important problem. It has varied applications, such as identifying anomalies in configuration files in routers [15], testing of router implementations [s], checking whether a network configuration satisfies a high-level policy of a network administrator by querying properties of the configuration [9] [11], and inferring such a high-level policy automatically from the network configuration [12] [6]. However, such an analysis is challenging, because packet routing in an IP network is a complex activity. Routers intervene between subnets (i.e., fully connected collections of hosts), and perform operations on packets such as filtering, routing to adjacent routers or subnets, and transformation, e.g., for network address translation (NAT). Each operation performed by a router is predicated (i.e., guarded) by the current content of the header of the packet, which, due to transformations, changes as the packet flows through the network. There are additional sources of complexity: the set of operations performed by a router is not fixed once and for all, but is modified as the network topology and load characteristics vary during operation. Also, the outcome of some of these operations depends not just on the content of the packet header, but also on the state of the connection that the packet belongs to. All of this makes it quite difficult to analyze the flow of packets across the network.

The state of practice for analyzing reachability is to send test packets in the actual network, using commercially available tools. However, testing does not give complete information about all possible packet flow outcomes, because it is infeasible to send all possible packets across a network. Several static (or offline) analysis approaches, e.g., [T], have been reported in the literature to overcome this disadvantage; these approaches analyze a specification of the network topology and router configurations (i.e., a model of the network), and emit information that over- or under-approximates all possible packet flows in the network.

1.1 Contributions 

1) Our primary contribution is an abstract interpretation [4] based analysis for determining packet flow properties in an IP network. To the best of our knowledge, ours is the first reported approach for this problem that is based on abstract interpretation, a technique that has typically been applied to the analysis of properties of programs. Abstract interpretation is a customizable framework, in the sense that it needs to be instantiated with a lattice (i.e., a domain of values to be used in the analysis) and a set of transfer functions operating on this lattice. Therefore, the analysis designer has the flexibility to use different lattices of differing precision for the same problem, and to prove that each one results in a semantically valid (but potentially approximate) analysis with respect to the most precise analysis. We take advantage of this capability by first spelling out a precise instantiation of our analysis, which always terminates (because packet sizes are bounded) but which may be expensive. Subsequently, we illustrate how to trade off this precision for scalability, while ensuring that the flow information we compute is an over-approximation of the precise flows. Previous static analysis approaches for network analysis are hard-wired, and do not readily admit such trade-offs within their overall approach.

2) We show that abstract interpretation is a flexible framework, capable of determining varying information about packet flows in a network. The first variant of our analysis, discussed in Section 4, computes a formula for each intermediate router that describes the set of packets that reach that router. Determining reachability at intermediate nodes (i.e., routers) has many applications, such as querying network policy [9] [11], and identifying rule anomalies and router misconfigurations [15]. The above-mentioned approaches employ custom solutions, which miss certain packet flows (and hence may be unsound) in the presence of cycles in the network. The problems addressed by these approaches can be solved as straightforward post-passes after our sound and generic reachability analysis.

The second variant of our analysis, discussed in Section 5, computes information at each intermediate router that represents not only the set of packets reaching that router, but also the original forms of these packets when they left their originating subnets (before they were transformed by address translation along the way).

3) We propose a novel application of our analysis. In previous work [12] [6], researchers have formulated the problem of inferring a high-level policy of the network in the restricted setting of single-router networks. We first generalize this problem to the setting of a network of multiple routers, and then show how to solve it using the second variant of our analysis.

2. RELATED WORK 

The previous static analysis techniques for IP networks that most closely resemble ours are the ones based on transitive closure analysis [14], and graph propagation with bounded unfolding of cycles [9] [11] [15]. All of these approaches compute packet reachability information at all nodes in the network.

The work of Xie et al. [14] is the seminal work in the area of formally specified static analysis of networks. For each pair of nodes i, j in the network, they compute, using Warshall's transitive closure algorithm, a formula that represents the set of packets leaving i that eventually reach j along all possible paths. Xie et al. pioneered the idea of uniformly treating filtering and NATing as transformations on (representations of) sets of concrete packets. The other approaches mentioned above, rather than using transitive closure, propagate (representations of) sets of packets explicitly along the edges in the network model. Our approach is similar to these approaches in this regard.

The approaches mentioned above do not soundly analyze packet flows along cyclic paths (i.e., they may miss certain packet flows). Consider the example in Fig. 1. Part (b) of this figure shows the configuration rules in the firewalls F1 and F2, in plain English for the sake of clarity. Essentially, F2 is a trusted subsidiary firewall to which F1 sends all packets for filtering. Therefore, e.g., a packet from Z1 addressed to Z2 takes the following (cyclic) path: Z1-F1-F2-F1-Z2. This example, although trivial, illustrates the subsidiary-firewall idiom commonly employed by network administrators to avoid overloading key firewalls (F1, in this case). The cycle in this path is not a "useless" cycle, in the sense that certain end-to-end flows can happen only through this cycle. In general, for any integer k, it is possible to construct a cycle going through k routers such that certain packets entering the cycle leave it only after going through the cycle k times. Therefore, unrolling all loops a fixed number of times (which is the idea behind the approaches mentioned above) is not sufficient. Abstract interpretation iterates the analysis until a fix-point is reached, and hence cleanly addresses this situation.

(a) [Figure: example network with zones Z1, Z2 and firewalls F1, F2]

(b)
F1 forwards all packets from Z1 or Z2 to F2 along the F1-F2 link on the right side.
F2 filters out bad packets, SNATs the src address field of good packets to a trusted address T, and forwards them to F1 along the F2-F1 link on the left side.
F1 forwards all packets that have src address T (i.e., verified packets) to Z1 or Z2, based on their destination address.

Figure 1: Packet propagation through cycles. (a) Example network (b) Routing configuration

Model checking is another technique that has been widely used in the literature [10] [7] [1] for static analysis of networks. While the former two approaches model the flow of a single packet through the network, Al Shaer's approach [1] models transitions of the set of all packets in a network. Since packet sizes in IP networks are bounded, model checking in this domain is capable of precise analysis even in the presence of cycles. Additionally, model checking can directly answer general temporal properties in addition to reachability (abstract interpretation can answer restricted forms of temporal properties too, depending on the abstraction chosen). Model checking, like abstract interpretation, can also use abstract domains to compute approximate solutions, e.g., as in the Slam [3] approach (although the existing model-checking based approaches for packet flow analysis do not do this). The unique aspect of abstract interpretation is the formalism that explicitly maps the abstract values in the abstract lattice to concrete values in the concrete domain (e.g., sets of packets), and uses this mapping, together with the properties of the given abstract transfer functions, to prove that the analysis is sound (i.e., computes an over-approximation of the precise information).

3. MODEL AND TERMINOLOGY 

A concrete packet is an IP packet in a network. We only model the headers of packets; let pkSz be the total number of bits in a packet header, partitioned into nFlds fields. We denote the fields of a packet p as p.f_1, p.f_2, ..., p.f_nFlds. These fields include the source address and port, and the destination address and port. Let Pk represent the domain of all concrete packets.

We now describe our model of a network. A network consists of a set of nodes N, which is partitioned into two categories: a set of zones (i.e., subnets) Z, which are terminal nodes, and a set of firewalls (i.e., routers) F, which are intermediate nodes. We use zones to model organizational subnets as single units; i.e., we assume that each zone z has a set of publicly visible IP addresses addr_z (with the sets of distinct zones being non-overlapping), and that a packet leaving or entering a zone contains only public IP addresses of that zone or other zones in its header. We use n, n_i, etc., to denote individual nodes, z, z_i, etc., to denote individual zones, and f, f_i, etc., to denote individual firewalls. Each zone has a single interface connected to the outside world, while each firewall has a set of one or more interfaces. E is an irreflexive, symmetric, binary relation on the set of all interfaces in the network, representing the physical links between the interfaces; for any link (i1, i2) ∈ E, we assume that i1 and i2 do not belong to the same firewall. We use node(i) to denote the zone or firewall to which interface i belongs. When we write (m, i1) → (n, i2), we mean that (i1, i2) ∈ E, node(i1) = m, and node(i2) = n.
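To make the network model concrete, the sketch below encodes nodes, interfaces, and the link relation E, and checks the stated assumptions (E irreflexive and symmetric, no link inside one firewall, one interface per zone). It is our own illustration: the Network class, its methods, and the interface labels used to encode our reading of Fig. 1 are hypothetical, not part of the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    zones: set
    firewalls: set
    owner: dict                               # interface name -> node(i)
    links: set = field(default_factory=set)   # the relation E, as ordered pairs

    def add_link(self, i1, i2):
        # E is symmetric, so every physical link is stored in both directions
        self.links.add((i1, i2))
        self.links.add((i2, i1))

    def node(self, i):
        return self.owner[i]

    def check(self):
        """Assert the modelling assumptions stated in the text."""
        for i1, i2 in self.links:
            assert i1 != i2, "E must be irreflexive"
            assert (i2, i1) in self.links, "E must be symmetric"
            same = self.node(i1) == self.node(i2)
            assert not (same and self.node(i1) in self.firewalls), "no link inside a firewall"
        for z in self.zones:
            assert list(self.owner.values()).count(z) == 1, "each zone has one interface"
        return True

    def edges(self):
        """All (m, i1, n, i2) such that (m, i1) -> (n, i2)."""
        return {(self.node(i1), i1, self.node(i2), i2) for i1, i2 in self.links}


def fig1_network():
    # Our reading of Fig. 1: Z1 and Z2 hang off F1; F1 and F2 share two links.
    net = Network(
        zones={"Z1", "Z2"},
        firewalls={"F1", "F2"},
        owner={"z1": "Z1", "z2": "Z2", "f1.z1": "F1", "f1.z2": "F1",
               "f1.right": "F1", "f1.left": "F1", "f2.right": "F2", "f2.left": "F2"},
    )
    net.add_link("z1", "f1.z1")
    net.add_link("z2", "f1.z2")
    net.add_link("f1.right", "f2.right")
    net.add_link("f2.left", "f1.left")
    return net
```

Each physical link yields two directed edges (m, i1) → (n, i2), which is what the propagation algorithm of Section 4 iterates over.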

We now describe our model of how each firewall is configured; this is based on the widely used package Iptables [2]. Each firewall f has four tables: a DNATing table f.dnat, a filtering table f.filt, an SNATing table f.snat, and a routing table f.rt. Each packet entering a firewall through any of its interfaces goes through the first three tables above, in the order mentioned, and finally leaves through an interface as decided by the routing table. We assume that firewalls are pure routers; i.e., they neither create nor ultimately accept packets. A filtering table is a sequence of filtering rules, while each of the two NATing tables is a sequence of NATing rules. Each rule r (filtering or NATing) has two components: its "guard" r.grd, which is a propositional formula on the bits in a packet header, and its "action" r.act. A concrete packet c is said to match a rule r if c satisfies the formula r.grd. A packet entering a table is matched against each rule in the table sequentially until a matching rule is found; the matching rule's action is then taken on the packet, and the remaining rules are ignored. For a filtering rule r, its action is either DROP or ACCEPT; if a packet matches a filtering rule r, it is thrown away if r.act is DROP, and is sent out as output from the table if r.act is ACCEPT. The final rule in any filtering table has the guard true (i.e., is a default rule). For any NATing rule r, r.NAT_field is a number that identifies the field in the packet header that is being NATed, while r.act is a formula representing a range of values. If the NATing rule matches a packet c, then the field of c identified by r.NAT_field is overwritten with one of the values in r.act, and the transformed packet is sent out as output from the table. If no rule in a NATing table matches a packet, it is sent out untransformed. DNATing rules write into the destination address or port field, while SNATing rules write into the source address or port field. The routing table f.rt of firewall f is a function from the interfaces of f to formulas, each of which is a constraint on destination addresses; i.e., if a packet c, after having gone through the DNAT, filtering, and SNAT tables in a firewall, has destination address d, it is then sent out of one of the interfaces i of f such that d satisfies the formula f.rt(i).
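The concrete firewall semantics just described can be sketched as a small interpreter. Under our own simplifying assumptions, a header is a tuple (src, sport, dst, dport) of integers, guards are Python predicates rather than propositional formulas over bits, and r.act of a NATing rule is an explicit set of values. Because NAT and routing choices are nondeterministic, the interpreter returns the set of all possible (outgoing interface, packet) outcomes. The dictionary keys mirror the table names in the text, but the function names and the example firewall are hypothetical.

```python
SRC, SPORT, DST, DPORT = range(4)   # field indices of a header tuple


def first_match(table, pkt):
    """Return the first rule whose guard the packet satisfies, or None."""
    for rule in table:
        if rule["grd"](pkt):
            return rule
    return None


def run_filter(table, pkt):
    rule = first_match(table, pkt)          # never None: the last guard is true
    return {pkt} if rule["act"] == "ACCEPT" else set()


def run_nat(table, pkt):
    rule = first_match(table, pkt)
    if rule is None:
        return {pkt}                        # no match: packet leaves untransformed
    f = rule["field"]                       # plays the role of r.NAT_field
    return {pkt[:f] + (v,) + pkt[f + 1:] for v in rule["act"]}


def run_firewall(fw, pkt):
    """All (interface, packet) pairs the packet may leave the firewall as."""
    outcomes = set()
    for p1 in run_nat(fw["dnat"], pkt):
        for p2 in run_filter(fw["filt"], p1):
            for p3 in run_nat(fw["snat"], p2):
                for iface, allowed in fw["rt"].items():
                    if allowed(p3[DST]):
                        outcomes.add((iface, p3))
    return outcomes


# Hypothetical example firewall: hosts 1..4 are internal, host 9 is blocked for host 1.
fw = {
    "dnat": [],
    "filt": [
        {"grd": lambda p: p[SRC] == 1 and p[DST] == 9, "act": "DROP"},
        {"grd": lambda p: True, "act": "ACCEPT"},
    ],
    "snat": [{"grd": lambda p: p[SRC] == 1, "field": SRC, "act": {5, 6}}],
    "rt": {"out": lambda d: d >= 9, "in": lambda d: d < 5},
}
```

Since SNAT may pick any value in r.act, one input packet can yield several outcomes; this is why the analysis treats all such choices as possible.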

Note in the discussion above that choices may have to be made by NATing rules as well as during the final routing step. We do not model how these choices are made during network operation; instead, our analysis assumes that all choices are possible. Also, we assume the following about the flow of concrete packets in the network: (a) There is no IP spoofing; i.e., every packet leaving a zone z has a source address that matches addr_z, and a source port that is within the valid port range of z. (b) Every packet that enters the network from a zone eventually reaches a zone



1. p.curr: Formula representing the set of concrete packets represented by p.

2. p.orig: Formula representing the set of original packets leaving a zone that, after flowing through the network, become the packets represented by p.curr.

3. p.ifNated: A vector of bits, one per field in a packet header. p.ifNated.b_i = 1 means that field f_i of p.curr contains a value written by NATing (by some firewall).

Note: The fields orig and ifNated are used only by the second variant of our algorithm, discussed in Section 5.

(a)

1: Inputs: (1) A network configuration, (2) an originating zone z0, (3) an abstract lattice, whose elements are abstract values, (4) an "initial" abstract value z0.from at zone z0, and (5) transfer functions for links.
2: Outputs: For each node n, an abstract value n.abs (representing the set of concrete packets that could reach n).
3:
4: Initialize z0.abs to z0.from. Mark z0.
5: For all nodes n other than z0, initialize n.abs to ⊥ (the bottom element of the abstract lattice).
6: while there exist marked nodes do
7:   Choose a marked node m, and unmark it.
8:   for all links (m, i1) → (n, i2) do
9:     Replace n.abs with n.abs ⊔ ff_{(m,i1)→(n,i2)}(m.abs).
10:    If the new value of n.abs differs from its old value, then mark n.
11:  end for
12: end while

(b)

Figure 2: (a) Fields in an abstract packet p ∈ AbsPk (b) Propagation of abstract packets.



z that it is supposed to reach (i.e., its destination address when it reaches z matches addr_z), or gets dropped by a filtering rule before it reaches any zone.

4. THE BASE ALGORITHM 

Instantiating an abstract interpretation requires us to specify (a) an abstract lattice, whose elements are called abstract values, which is closed with respect to the join operation (i.e., least upper bound, or ⊔), (b) a directed graph on which the analysis is to be performed, (c) transfer functions for the edges in the graph, which specify the abstract propagation semantics of the edges (as functions from abstract values to abstract values), and (d) the initial abstract value at some designated originating node z0 of the graph. In our setting the nodes in the network are the graph nodes, and each link (m, i1) → (n, i2) in the network results in a graph edge m → n. We show the abstract interpretation algorithm in Fig. 2(b); this is basically Kildall's algorithm [s], instantiated to our setting. The idea behind the algorithm is to keep track of an abstract value m.abs at each node m. In our setting, each abstract value is a set of abstract packets from the domain AbsPk, where each abstract packet in turn intuitively represents a set of concrete packets. Whenever the abstract value m.abs at a node m changes, it is propagated through each outgoing link (m, i1) → (n, i2) out of m (see lines 8-11), using the transfer function ff_{(m,i1)→(n,i2)} of the link, to the successor node n of the link; at n this incoming value is joined with the current abstract value at n. The algorithm terminates when the abstract values at all nodes stabilize (i.e., reach a fix-point); these values represent the result of the algorithm.
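The propagation loop of Fig. 2(b) is a standard worklist algorithm and can be sketched generically, with the lattice supplied through bottom, join, and transfer arguments. This is our own sketch, not the paper's implementation; the function name, the encoding of a packet as a (src, dst) label pair, and the hand-written link transfer functions for our reading of the Fig. 1 network are all hypothetical.

```python
def propagate(nodes, edges, z0, init, bottom, join, transfer):
    """Worklist propagation following Fig. 2(b).

    edges: iterable of (m, i1, n, i2) tuples, one per link (m, i1) -> (n, i2)
    transfer(edge, value): the link transfer function applied to m.abs
    """
    out_edges = {}
    for e in edges:
        out_edges.setdefault(e[0], []).append(e)
    absv = {n: bottom for n in nodes}       # line 5
    absv[z0] = init                         # line 4
    marked = {z0}
    while marked:                           # line 6
        m = marked.pop()                    # line 7
        for e in out_edges.get(m, []):      # line 8
            n = e[2]
            new = join(absv[n], transfer(e, absv[m]))   # line 9
            if new != absv[n]:              # line 10: re-mark only on change
                absv[n] = new
                marked.add(n)
    return absv


# Hypothetical encoding of Fig. 1: a packet is a (src, dst) pair of labels.
def keep(pred):
    return lambda v: frozenset(p for p in v if pred(p))


TF = {
    ("z1", "f1.z1"): keep(lambda p: p[0] == "z1"),                   # no spoofing
    ("z2", "f1.z2"): keep(lambda p: p[0] == "z2"),
    ("f1.right", "f2.right"): keep(lambda p: p[0] in ("z1", "z2")),  # F1 -> F2
    ("f1.left", "f2.left"): lambda v: frozenset(),                   # unused direction
    ("f2.left", "f1.left"): lambda v: frozenset(("T", d) for _, d in v),  # SNAT to T
    ("f2.right", "f1.right"): lambda v: frozenset(),                 # unused direction
    ("f1.z1", "z1"): keep(lambda p: p == ("T", "z1")),
    ("f1.z2", "z2"): keep(lambda p: p == ("T", "z2")),
}
OWNER = {"z1": "Z1", "z2": "Z2", "f1.z1": "F1", "f1.z2": "F1",
         "f1.right": "F1", "f1.left": "F1", "f2.right": "F2", "f2.left": "F2"}
EDGES = [(OWNER[a], a, OWNER[b], b) for a, b in TF]

result = propagate(
    nodes={"Z1", "Z2", "F1", "F2"}, edges=EDGES, z0="Z1",
    init=frozenset({("z1", "z1"), ("z1", "z2")}), bottom=frozenset(),
    join=lambda a, b: a | b, transfer=lambda e, v: TF[(e[1], e[3])](v),
)
```

In this run the packet from Z1 reaches Z2 only after traversing the F1-F2-F1 cycle, and the loop stops once no node's value changes, without any fixed bound on the number of unrollings.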



Both variants of our algorithm share the basic structure mentioned above. However, they differ in the content of the abstract packets, and in the join operation and the transfer functions. We discuss the initial variant of the algorithm in this section, and the second variant in the next section. For the initial variant, each abstract value is a singleton set, i.e., a single abstract packet. Each abstract packet p, in turn, is a structure with a single field curr, which is a propositional formula on the bits b_0, b_1, ..., b_{pkSz-1} in a packet header; see Fig. 2(a), ignoring the fields orig and ifNated for now (they are used by the second variant of our algorithm). An abstract packet p represents exactly the set of concrete packets that satisfy the formula p.curr. For instance, assuming packet headers have only three bits, written b_2 b_1 b_0, the formula b_2 ∧ ¬b_0 represents the set of packet headers {100, 110}. This is formalized using the mapping γ from abstract packets to sets of concrete packets, and its inverse mapping α:

$$\gamma(p) = \{\, c \mid c \text{ is a concrete packet, and } c \text{ satisfies } p.\mathit{curr} \,\}$$

$$\alpha(s) = p, \text{ such that } \gamma(p) = s$$
Since the abstract packet n.abs at a node n is meant to represent the set of concrete packets that reach node n along all possible paths, it is natural for the join operator to be logical OR; i.e., p_1 ⊔ p_2 = p_3, where p_3.curr = p_1.curr ∨ p_2.curr. The "initial" abstract value z0.from at the originating zone z0 is an abstract packet whose formula curr is satisfied by all concrete packets whose source address is in addr_z0. The transfer functions are shown in the appendix; ignore the statements labeled "Variant 2" or "Inferring policy" for now.
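Because formulas over a fixed number of bits have finitely many models, γ, α, and the join can be checked by brute force on toy headers. The sketch below is our own illustration: formulas are modelled as Python predicates over a bit assignment (not the paper's representation), and the helper names are hypothetical. It reproduces the three-bit example and checks that the join corresponds to set union under γ and that γ(α(s)) = s.

```python
from itertools import product

PK_SZ = 3   # toy header width, as in the three-bit example


def header(b):
    """Render a bit assignment {i: bool} as the string b_{PK_SZ-1} ... b_0."""
    return "".join("1" if b[i] else "0" for i in reversed(range(PK_SZ)))


def gamma(curr):
    """gamma(p): all headers that satisfy the formula p.curr."""
    result = set()
    for bits in product([False, True], repeat=PK_SZ):
        b = dict(enumerate(bits))            # b[i] is the value of bit b_i
        if curr(b):
            result.add(header(b))
    return result


def alpha(s):
    """alpha(s): a formula whose satisfying headers are exactly s."""
    return lambda b: header(b) in s


def join(curr1, curr2):
    """Join of two abstract packets: the logical OR of their curr formulas."""
    return lambda b: curr1(b) or curr2(b)


p = lambda b: b[2] and not b[0]      # b2 and not b0
q = lambda b: b[1] and not b[2]      # b1 and not b2
```

For example, gamma(p) evaluates to {"100", "110"}, matching the set given in the text.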
Routine filter_tableTF(t, In) in Appendix A.2 is the pseudo-code for the transfer function for a filtering table t (In is the set of abstract packets coming into t, while the return value is the set of abstract packets that come out of t). Similarly, nat_tableTF(t, In) in Appendix A.4 is the pseudo-code for the transfer function for a NATing table t. For each filtering or NATing table t, its transfer function ff_t has the signature AbsPk → AbsPk, and captures the sequential effect of all the rules in the table. For any abstract packet p, the abstract packet ff_t(p) represents precisely the set of concrete packets that would result when the concrete packets represented by p flow through the table.
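The sequential, first-match effect of a filtering table can be sketched by splitting the incoming abstract packet rule by rule, so that each later rule only sees the part that matched no earlier rule. In this sketch (ours; the functions are named after the appendix routines, but the bodies, the rule encoding as (guard, action) pairs, and the example table are hypothetical), a formula is replaced by the explicit frozenset of headers it allows, which is only practical for toy header widths. A reference first-match interpreter is included to check that the table transfer function is precise on this example.

```python
def filter_ruleTF(rule, curr):
    """Split an abstract packet on a filtering rule's guard.

    curr is modelled extensionally as a frozenset of concrete headers.
    Returns (part satisfying r.grd, part satisfying not r.grd).
    """
    grd, _act = rule
    matched = frozenset(c for c in curr if grd(c))
    return matched, curr - matched


def filter_tableTF(table, curr):
    """Sequential (first-match) effect of a whole filtering table."""
    out, rest = frozenset(), curr
    for rule in table:
        matched, rest = filter_ruleTF(rule, rest)   # later rules see only the rest
        if rule[1] == "ACCEPT":
            out |= matched
    return out


def concrete_filter(table, c):
    """Reference semantics for a single concrete header."""
    for grd, act in table:
        if grd(c):
            return act == "ACCEPT"
    raise ValueError("filtering tables must end with a default rule")


# Hypothetical table over (src, dst) headers.
table = [
    (lambda c: c[0] == 1 and c[1] == 9, "DROP"),
    (lambda c: c[0] == 1, "ACCEPT"),
    (lambda c: True, "DROP"),
]
curr = frozenset({(1, 9), (1, 3), (2, 9)})
```

Header (1, 9) satisfies the guards of both the first and second rules, but it is dropped because only the part that fails the first guard ever reaches the second rule.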

The routine filter_ruleTF(r, p) in Appendix A.1 is the pseudo-code for the transfer function for an individual filtering rule r, applied to an incoming abstract packet p. Similarly, the routine nat_ruleTF(r, p) in Appendix A.3 is the transfer function for an individual NATing rule r. In both these transfer functions, an incoming abstract packet could get split into two outgoing abstract packets: one that matches the rule (and gets the formula p.curr ∧ r.grd), and one that does not match the rule (and gets the formula p.curr ∧ ¬r.grd). In addition, each NATing rule r updates the field of the incoming packet p indicated by r.NAT_field, by writing into this field the values in the range r.act. This is accomplished by the subroutine natPacket(p, r). The transfer function ff_{(m,i1)→(n,i2)} of a link (m, i1) → (n, i2) is shown in Appendix A.5, and is the only transfer function to be invoked directly by our propagation algorithm in Fig. 2(b). It works as follows: when given an abstract packet p, it routes the packet through the tables m.dnat, m.filt, and m.snat, in that order. Finally, it refines the packet to exclude the concrete packets it represents whose destination addresses do not satisfy the formula m.rt(i1).
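The link transfer function composes the three table transfer functions and then restricts the result by the routing formula of the outgoing interface. The sketch below is our own, over explicit sets of toy (src, dst) headers. The helper names (split, nat_packet, nat_tableTF, linkTF) and the example firewall, which is only loosely modelled on F1 in Fig. 3, are hypothetical, and r.act is again an explicit set of values.

```python
SRC, DST = 0, 1   # field indices of a toy (src, dst) header


def split(grd, curr):
    """Return (curr and grd, curr and not grd) over explicit header sets."""
    matched = frozenset(c for c in curr if grd(c))
    return matched, curr - matched


def nat_packet(curr, fld, values):
    """Stand-in for natPacket: overwrite field fld with every value in r.act."""
    return frozenset(c[:fld] + (v,) + c[fld + 1:] for c in curr for v in values)


def nat_tableTF(table, curr):
    out, rest = frozenset(), curr
    for grd, fld, values in table:
        matched, rest = split(grd, rest)
        out |= nat_packet(matched, fld, values)
    return out | rest                       # unmatched packets pass untransformed


def filter_tableTF(table, curr):
    out, rest = frozenset(), curr
    for grd, act in table:
        matched, rest = split(grd, rest)
        if act == "ACCEPT":
            out |= matched
    return out


def linkTF(fw, i1, curr):
    """Transfer function of a link leaving firewall fw through interface i1."""
    curr = nat_tableTF(fw["dnat"], curr)
    curr = filter_tableTF(fw["filt"], curr)
    curr = nat_tableTF(fw["snat"], curr)
    return frozenset(c for c in curr if fw["rt"][i1](c[DST]))


# Hypothetical firewall: one zone uses hosts 1-2, another hosts 3-4,
# host 99 is blocked, and addresses >= 50 are external.
fw = {
    "dnat": [],
    "filt": [(lambda c: c[SRC] <= 4 and c[DST] == 99, "DROP"),
             (lambda c: True, "ACCEPT")],
    "snat": [(lambda c: c[SRC] in (1, 2), SRC, (11, 12)),
             (lambda c: c[SRC] in (3, 4), SRC, (13, 14))],
    "rt": {"wan": lambda d: d >= 50, "lan": lambda d: d < 50},
}
curr = frozenset({(1, 99), (1, 60), (3, 60), (1, 3)})
```

Packets to the blocked host are removed by the filtering table, the survivors are SNATed, and the routing formula of the chosen interface keeps only the matching destinations.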

For an illustration, consider the example network in Fig. 3(a), the firewall configurations of which are shown in part (b). Zones Z1, Z2, and Z3 belong to the same organization, while Z4 models the outside Internet. Firewall F1 is the primary gateway of the organization. Zones Z1 and Z2 use private IP addresses; Z1 uses the private address range 10.192.29.[1-255], while Z2 uses the private address range 10.192.28.[1-255]. F1 drops packets from these two zones to the blocked outside host 209.85.153.85; see Rules 1 and 2 (in the guards, s stands for source address and d stands for destination address). Zone Z3 provides a service to the rest of the organization, and is accessible at the (public) address 202.65.23.2 (i.e., it is outside the organization's intranet). F1's SNAT table translates the source addresses of packets coming from zone Z1 to the (public) range 202.67.34.[6-10] (see Rule 3), and the source addresses of packets coming from zone Z2 to the (public) range 202.67.34.[1-5] (see Rule 4). Finally, F2 denies access to Z3 for packets whose source address is in the range 202.67.34.[6-10] (see Rule 5). This range corresponds to packets that came originally from Z1 and were NATed by F1.

(a) [Figure: example network with zones Z1, Z2, Z3, Z4 and firewalls F1, F2]

(b)
      Guard                                     Action
  F1 filtering table:
  1.  s=10.192.29.[1-255], d=209.85.153.85      DROP
  2.  s=10.192.28.[1-255], d=209.85.153.85      DROP
  F1 SNAT table:
  3.  s=10.192.29.[1-255]                       SNAT 202.67.34.[6-10]
  4.  s=10.192.28.[1-255]                       SNAT 202.67.34.[1-5]
  F2 filtering table:
  5.  s=202.67.34.[6-10]                        DROP

(c)
Z1.abs = < [10.192.29.1-255 : true], [10.192.29.1-255 : true] >
Z2.abs = < [202.67.34.6-10 : 10.192.28.1-255], [10.192.29.1-255 : 10.192.28.1-255] >
Z4.abs = < [202.67.34.6-10 : ¬{10.192.28.1-255, 10.192.29.1-255, 209.85.153.85, 202.65.23.2}], [10.192.29.1-255 : ¬{10.192.28.1-255, 10.192.29.1-255, 209.85.153.85, 202.65.23.2}] >

Figure 3: (a) Example network (b) Firewalls configuration (c) Reached abstract packets, with Z1 as origin

Consider a run of our algorithm starting from zone Z1. No abstract packet reaches zone Z3, because F2 denies access from Z1. The abstract packet reaching each other zone is shown in Fig. 3(c). Our notation is as follows: the text inside each pair of angle brackets is an abstract packet. There are two components inside each abstract packet p, delimited by square brackets. The first component is p.curr; ignore the second component for now. For convenience, we denote the formula p.curr as a pair of constraints on the source and destination fields, respectively, separated by a colon.

4.1 Correctness and complexity 

The abstract interpretation framework guarantees termi- 
nation and correctness as long as the instantiation (i.e., the 
lattice, and the transfer functions) satisfy certain sufficient 
conditions (we refer you to Cousot and Cousot's paper |4] 
for the details of the sufficient conditions). Since the for- 
mula n.abs.curr at any node n keeps monotonically getting 
weaker (due to joins) , and since the number of distinct for- 
mulas is finite (due to the fixed packet width), the algorithm 
is guaranteed to reach a fix point and terminate. 
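The termination argument can be turned into an explicit bound. Writing S_k for the set of headers satisfying n.abs.curr after the k-th change of the value at node n, and assuming, as in a standard worklist algorithm, that a node is re-marked only when its value changes, a short derivation (ours) gives:

$$
\begin{aligned}
&S_k \subseteq S_{k+1} && \text{(joins only weaken } n.\mathit{abs}.\mathit{curr}\text{)}\\
&S_k \neq S_{k+1} \implies |S_{k+1}| \ge |S_k| + 1 && \text{(each change adds at least one header)}\\
&|S_k| \le 2^{pkSz} && \text{(there are } 2^{pkSz} \text{ headers in all)}\\
&\Longrightarrow \text{number of changes at } n \le 2^{pkSz}
\end{aligned}
$$

Hence each node is taken from the worklist at most 2^{pkSz} + 1 times, which is consistent with the O(2^{pkSz}) update bound for a single node.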

Our transfer functions are precise, in the sense described earlier. Also, our abstract lattice is precise, in the sense that for any set s of concrete packets, γ(α(s)) = s (an abstract lattice is called imprecise if for some set s we have γ(α(s)) ⊋ s). Therefore, our analysis is precise; i.e., the final abstract packet n.abs at each node n represents precisely the set of concrete packets that will eventually flow through n (after passing through all three of its tables), assuming an initial configuration wherein all concrete packets represented by the abstract packet z0.from start out from zone z0.

Reachability analysis in networks is an NP-complete problem [7] (in the packet size). In the worst case there could be an exponential number of paths to a node n in a network, and the abstract packet n.abs.curr at this node could in the worst case be updated O(2^{pkSz}) times during a run of the algorithm. The precise abstract-interpretation formulation described so far therefore has running-time requirements similar to those of the model checking approaches reported in the literature [10] [7], which also answer reachability.



4.2 Precision-efficiency trade-offs 

A key benefit of abstract interpretation is that it uses a join operation to merge abstract values reaching any node; therefore, it is possible to tweak the abstract packet structure, as well as the join operation, to improve efficiency (by reducing precision). We illustrate this idea by considering one such optimization. Rather than have a single formula describing all the bits in the packet header, we model an abstract packet p as a sequence of formulas curr_1, curr_2, ..., curr_nFlds, where nFlds is the number of fields in a packet header. This is typically called an independent attribute analysis in the program analysis literature, as opposed to a relational (i.e., precise) analysis. AND and OR operations are now done separately on each pair of formulas (of corresponding fields), while NOT of any sequence of formulas is approximated as true (otherwise, the negation of an abstract packet could result in an exponential number of abstract packets). Therefore, the worst-case number of updates to the abstract packet at any node during a run of the algorithm is now O(nFlds · 2^{fldSz}), where fldSz is the number of bits in the largest field. This is exponential in the size of the largest individual field, as opposed to exponential in the total size of the packet, which is a significant gain in practice. While this analysis may over-approximate the packet flows in the network, it still has value; e.g., if it says that a certain (undesirable) packet flow is not possible, this is guaranteed to be the case. Also, one could start with an imprecise analysis, and then progressively improve its precision using the idea of counter-example guided abstraction refinement, e.g., as in Slam [3], until the undesirable packet flow to be verified is proved with certainty to be either possible or impossible.

Input: Two sets of abstract packets P1 and P2
Result = ∅
for all abstract packets p1 ∈ P1 do
  Let newCurr = p1.curr ∨ ⋁_{p2 ∈ P2 | p2.orig = p1.orig} p2.curr
  Result = Result ∪ {p}, where p is a new abstract packet such that p.curr = newCurr and p.orig = p1.orig
end for
for all abstract packets p2 ∈ P2 such that no p1 ∈ P1 has p1.orig = p2.orig do
  Result = Result ∪ {p2}
end for
return Result

Figure 4: Optimized join operation
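The precision lost by the independent-attribute domain is easy to see on a toy example. In the sketch below (ours; each per-field formula curr_i is modelled by the explicit set of values it allows, which is only practical for tiny fields, and the function names are hypothetical), joining the packets (1, 1) and (2, 2) yields an abstract packet whose concretization also contains the spurious headers (1, 2) and (2, 1): the result is a sound over-approximation, but it no longer relates the fields to each other.

```python
from itertools import product


def alpha_indep(concrete):
    """Abstract a non-empty set of headers into one value set per field."""
    n_flds = len(next(iter(concrete)))
    return tuple(frozenset(c[i] for c in concrete) for i in range(n_flds))


def gamma_indep(absp):
    """Concretize: every combination of per-field values."""
    return set(product(*absp))


def join_indep(p, q):
    # OR is applied field by field
    return tuple(a | b for a, b in zip(p, q))


def meet_indep(p, q):
    # AND is applied field by field
    return tuple(a & b for a, b in zip(p, q))


def not_indep(_p, domains):
    # NOT is approximated as "true": each field may take any value of its domain
    return tuple(frozenset(d) for d in domains)


p = alpha_indep({(1, 1)})
q = alpha_indep({(2, 2)})
j = join_indep(p, q)
```

As an illustrative assumption, if a header consisted only of the four address and port fields (32 + 16 + 32 + 16 = 96 bits), the update bound would drop from O(2^96) to O(4 · 2^32) = O(2^34).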

5. EXTENDED ALGORITHM 

In the first variant of our algorithm, discussed in the previous section, the field p.curr of any abstract packet p represents the set of packets that have reached the node where p resides. Note that due to NATing, the current form of these packets (as represented by p.curr) could be different from their original form when they left the designated source zone z0. In this variant of the algorithm we extend the abstract packet to have another field p.orig, which represents the original forms of the packets represented by p.curr when they left z0. This information, which augments the reachability information, is likely to be useful in a variety of bug detection, understanding, and verification tasks. We explore a specific application of this analysis later in this section.

In this variant an abstract value (i.e., abstract lattice element) is a possibly non-singleton set of abstract packets. The "initial" abstract value z0.from leaving the zone z0 is a singleton set containing an abstract packet p whose curr and orig formulas are identical, and are satisfied by all concrete packets whose source address is in addr_z0.

As before, we formalize the semantics of each abstract packet by defining α and γ maps that relate abstract packets to concrete packets. To enable this, we first extend our model of concrete packets. We let each concrete packet c have two fields c.curr and c.orig; the first represents its current contents, and keeps changing as the packet flows through NATing rules, while the second is fixed, retaining the packet's original form throughout. Now:

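$$\gamma(P) = \bigcup_{p \in P} \gamma(p), \text{ where } P \text{ is a set of abstract packets}$$

$$\gamma(p) = \{\, c \mid c \text{ is a concrete packet}, c.\mathit{curr} \text{ satisfies } p.\mathit{curr}, c.\mathit{orig} \text{ satisfies } p.\mathit{orig} \,\}, \text{ where } p \text{ is an abstract packet}$$

$$\alpha(C) = P, \text{ such that } \gamma(P) = C, \text{ where } C \text{ is a set of concrete packets}$$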
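A concrete packet now carries two headers, so γ(p) is the set of pairs (c.curr, c.orig) allowed by the two formulas. The sketch below spells out the two γ equations. It is our own illustration: formulas are again replaced by explicit sets of toy (src, dst) headers, and the AbsPk class and function names are hypothetical.

```python
from itertools import product
from typing import NamedTuple


class AbsPk(NamedTuple):
    curr: frozenset   # headers allowed by p.curr
    orig: frozenset   # headers allowed by p.orig


def gamma_pk(p):
    """gamma(p): concrete packets (c.curr, c.orig) agreeing with both formulas."""
    return set(product(p.curr, p.orig))


def gamma_set(P):
    """gamma(P): union of gamma(p) over the abstract packets p in P."""
    out = set()
    for p in P:
        out |= gamma_pk(p)
    return out


# Toy (src, dst) headers: source 1 was SNATed to 11 or 12 on its way to host 60.
p = AbsPk(curr=frozenset({(11, 60), (12, 60)}), orig=frozenset({(1, 60)}))
q = AbsPk(curr=frozenset({(13, 60)}), orig=frozenset({(3, 60)}))
```

Each concrete packet in gamma_pk(p) pairs a possible current header with a possible original header, for instance ((11, 60), (1, 60)).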

In other words, the correctness guarantee of the algorithm is that if an abstract packet p is in the set n.abs at some node n, then for every concrete packet c1 that satisfies the formula p.orig and for every concrete packet c2 that satisfies the formula p.curr, there is a path in the network from z0 to n such that c1 is in z0.from and c1 is transformed into c2 by the time it reaches n along the path.

In this setting, a precise way to define the join of two sets of abstract packets P1 and P2 is set union. However, we present an optimized version of this join in Fig. 4, which is still precise, and is sufficient to guarantee the correctness property mentioned above.
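The optimized join can be sketched in a few lines. In this sketch (ours, over explicit sets of toy headers, with a hypothetical AbsPk class), two orig formulas are considered equal when they allow the same headers. The final step keeps the packets of P2 whose orig matches no packet of P1; without it, the result would not represent γ(P1) ∪ γ(P2).

```python
from itertools import product
from typing import NamedTuple


class AbsPk(NamedTuple):
    curr: frozenset
    orig: frozenset


def gamma_set(P):
    """gamma(P) for a set of abstract packets over explicit header sets."""
    return {pair for p in P for pair in product(p.curr, p.orig)}


def optimized_join(P1, P2):
    """Merge the curr parts of abstract packets that share the same orig."""
    result = set()
    for p1 in P1:
        same_orig = [p2.curr for p2 in P2 if p2.orig == p1.orig]
        result.add(AbsPk(p1.curr.union(*same_orig), p1.orig))
    # Packets of P2 whose orig matches nothing in P1 must be kept as they are.
    origs1 = {p1.orig for p1 in P1}
    result |= {p2 for p2 in P2 if p2.orig not in origs1}
    return result


P1 = {AbsPk(frozenset({(11, 60)}), frozenset({(1, 60)}))}
P2 = {AbsPk(frozenset({(12, 60)}), frozenset({(1, 60)})),
      AbsPk(frozenset({(13, 60)}), frozenset({(3, 60)}))}
J = optimized_join(P1, P2)
```

Because γ(p) is a product of the curr and orig sets, merging the curr parts of packets with equal orig does not change the represented concrete packets, which the union check on gamma_set confirms for this example.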

The transfer functions for this new lattice are the same
ones discussed earlier (shown in the appendix), except that
the lines labeled "Variant 2" are now included; ignore the
lines labeled "Inferring policy" for now. The changes to the
transfer functions filter_ruleTF(_, _) and nat_ruleTF(_, _) can
be summarized as follows: as each packet flows through a
rule, its "orig" version is refined using the guard of the rule,
but only on the fields that have not been NATed yet.
We keep track of which fields in p have been NATed so far
by any firewall along the path on which p flowed, using
an auxiliary bit p.ifNated.b_l for each field l in the packet
header. The "orig" formula is not refined for fields that have
been NATed, because for a NATed field the rule's guard refers
to the new (NATed) value, not the original value.
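The effect of reduce on the matched part of a single rule can be sketched as below. The encoding is our own assumption: a guard is a dict mapping a field index to the set of values it allows (so the dict is a conjunction of atomic predicates, one per field), formulas are explicit sets of header tuples, and ifNated is a tuple of booleans. All function names are hypothetical.

```python
def reduce_guard(guard, nated):
    """Drop the atomic predicates that mention an already NATed field;
    an empty dict plays the role of the formula 'true'."""
    return {f: vals for f, vals in guard.items() if not nated[f]}


def satisfies(header, guard):
    """True if the header meets every atomic predicate 'field f in vals'."""
    return all(header[f] in vals for f, vals in guard.items())


def refine_matched(curr, orig, guard, nated):
    """Matched part of a rule: curr is refined by the whole guard,
    orig only by the predicates on fields that have not been NATed."""
    kept = reduce_guard(guard, nated)
    new_curr = frozenset(h for h in curr if satisfies(h, guard))
    new_orig = frozenset(h for h in orig if satisfies(h, kept))
    return new_curr, new_orig


# Two fields (src, dst); src (field 0) was rewritten by an earlier NAT rule.
nated = (True, False)
guard = {0: {7}, 1: {2}}  # src == 7 and dst == 2
curr = frozenset({(7, 2), (7, 3)})
orig = frozenset({(1, 2), (1, 3)})
print(refine_matched(curr, orig, guard, nated))
# -> (frozenset({(7, 2)}), frozenset({(1, 2)}))
```

The predicate on src is checked against the NATed current value 7 but is not applied to the original source 1, which would otherwise be wrongly excluded.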

There are additional changes in the transfer function
nat_ruleTF(r, p). If the field r.NAT_field of p is being
NATed for the first time in the history of this packet, we first
extract the content of this field from p.curr (which still
represents the original value of the field when the packet left its
source zone) and copy it to the corresponding field in p.orig.
This is done by calling the routine update_original_packet(_, _).
We then update the field r.NAT_field in p.curr by calling
natPacket(_, _) (this is the same as in Variant 1 of the
algorithm).

The transfer function described above is precise, in the
sense that for any abstract packet p and any NATing table
t, the abstract packet f_t(p) represents precisely the set of
concrete packets that would result when the concrete packets
represented by p flow through the table. As a result,
for any abstract packet p at a node n, p.orig
precisely captures the original forms of the packets leaving
z0 that reach n and that are represented by p.curr.

The semantics of copying field l from p.curr to
p.orig, mentioned above, can be stated more precisely as
follows. We extract the original content of this field from
p.curr as a formula m_l, which represents the set s_l of original
concrete bit sequences that reside in field l of the concrete
packets represented by p before the NATing happens. We
then update the formula p.orig such that all concrete packets
represented by it now have a bit sequence from s_l in their
field l, while their other fields are undisturbed.
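The NATing steps described in the paragraphs above (copy the field into p.orig on its first NAT only, set the ifNated bit, then overwrite the field in p.curr) can be sketched as follows. This is a minimal illustration, assuming headers are tuples, formulas are explicit frozensets of headers, and a rule's action is a set of replacement values; guard refinement is omitted and every name is hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AbsPacket:
    curr: frozenset  # current headers (tuples of field values)
    orig: frozenset  # original headers
    nated: tuple     # nated[f] is True once field f has been NATed


def set_field(t, f, value):
    return t[:f] + (value,) + t[f + 1:]


def update_original_packet(p, f):
    """Give every orig header each value s_l of field f found in p.curr;
    the other orig fields are left as they are."""
    s_l = {h[f] for h in p.curr}
    return replace(p, orig=frozenset(set_field(o, f, v) for o in p.orig for v in s_l))


def nat_packet(p, f, new_values):
    """Overwrite field f of p.curr with the range new_values."""
    return replace(p, curr=frozenset(set_field(c, f, v) for c in p.curr for v in new_values))


def nat_matched(p, f, new_values):
    """Matched branch of a NATing rule on field f (guard refinement omitted)."""
    if not p.nated[f]:  # first NAT of this field along the path
        p = update_original_packet(p, f)
    p = replace(p, nated=set_field(p.nated, f, True))
    return nat_packet(p, f, new_values)


p = AbsPacket(curr=frozenset({(1, 5)}), orig=frozenset({(0, 5), (1, 5)}), nated=(False, False))
p1 = nat_matched(p, 0, {9})   # first NAT of field 0: orig field 0 becomes 1
p2 = nat_matched(p1, 0, {4})  # second NAT of field 0: orig is left alone
print(p1.curr, p1.orig)       # frozenset({(9, 5)}) frozenset({(1, 5)})
print(p2.curr, p2.orig)       # frozenset({(4, 5)}) frozenset({(1, 5)})
```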

Consider again the example in Fig. 3, where we run the
analysis starting from zone Z1. Note that a single abstract
packet p (delimited by angle brackets) reaches zone Z4 (see
Part (c) of the figure). The first component inside this
abstract packet denotes p.curr; note that its source address
is the address range 202.67.34.6-10 that was written by the
NAT rule in F1. The second component denotes p.orig; note
that its source address is the original source address range
of the packet leaving Z1 (i.e., 10.192.29.1-255).

5.1 Application: Inferring a high-level policy of a network

Real-life networks can be large, with 5-500 intermediate
routers [m]. Configuring these routers correctly is a complex
and error-prone task. In a study of 37 real firewalls,
Wool [13] found that every one of them was misconfigured
and had security vulnerabilities. Therefore, it is important
for network administrators to have access to tools that infer
a compact, high-level policy from a network that has already
been set up, to help them debug and validate the configuration.
Tongaonkar et al. [12] and Horowitz et al. [6] have proposed
inferring a policy for a single firewall. In both these
approaches the initial step is to find the rules that have
overlapping guards, and then to present a transformed, or
differently organized, version of the ruleset. While Tongaonkar
et al. flatten the ruleset by eliminating all overlap between
rules, Horowitz et al. organize the rules hierarchically, with
rules with weaker guards placed "above" rules with stronger
guards. These ideas do not extend cleanly to the setting of
multiple firewalls connected as a network. Due to the large
number of rules in real networks, and because different sets
of rules may be correlated along different paths in a network,
it is not clear that rule correlations can be presented in a
natural, compact manner in this setting.

Our hypothesis is that in many cases it would help the
administrator if, for each zone z, they were simply given an
"accept" formula that characterizes the set of packet headers
that leave z and eventually reach some other zone, and a
"reject" formula that characterizes the set of packet headers
leaving z that get dropped by some rule. The two sets may,
in general, overlap; a non-empty overlap should be
a matter of concern to the administrator, because packets
matching both formulas may reach some zone, or none
at all, depending on the (non-deterministic) route they take
through the network. This pair of formulas for zone z is a
high-level policy, in the sense that it is compact and conveys
useful end-to-end information whose representation is not
tied to the actual way in which the network configuration
has been set up.

The first step in determining this high-level policy is to run
our analysis treating z as the "originating" zone z0. Then,
where Z is the set of zones in the network, the "accept"
formula for zone z is simply

$\bigvee_{z_i \in Z \setminus \{z\}} z_i.\mathit{abs}.\mathit{orig}$

If the set of all filtering rules in the network with DROP
as the action is represented by D, then the "reject" formula
for z is

$\bigvee_{r \in D} r.\mathit{dropped\_packets}$

where r.dropped_packets is the set of packets (in their original
form) that match (and are hence dropped by) rule r.
These sets are already computed by our algorithm described
above during normal propagation. Therefore, to support
this application, we simply save these sets during propagation
(see the line with the comment "Inferring policy" in the
routine filter_ruleTF(_, _) in Appendix A.1), and use them
here to construct the "reject" formulas.
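A small sketch of how the two formulas, and their worrying overlap, could be assembled once the analysis has run from zone z. It assumes, hypothetically, that the results are available as a dict zone_abs mapping each zone to the list of orig header sets of the abstract packets reaching it, and a dict dropped mapping each DROP rule to the orig sets it recorded; formulas are explicit sets of headers, so ∨ becomes set union. The zone and rule names and all header values are toy data.

```python
def accept_formula(z, zone_abs):
    """Disjunction (union) of the orig sets reaching every zone other than z."""
    result = set()
    for zone, origs in zone_abs.items():
        if zone != z:
            for orig in origs:
                result |= orig
    return result


def reject_formula(dropped):
    """Disjunction (union) of r.dropped_packets over all DROP rules r."""
    result = set()
    for origs in dropped.values():
        for orig in origs:
            result |= orig
    return result


def policy(z, zone_abs, dropped):
    accept = accept_formula(z, zone_abs)
    reject = reject_formula(dropped)
    # A non-empty overlap means the fate of these headers depends on the route taken.
    return accept, reject, accept & reject


# Toy headers are (src, dst) pairs; all values below are made up.
zone_abs = {"Z1": [], "Z2": [{(1, 2)}], "Z4": [{(1, 4), (1, 5)}]}
dropped = {"rule1": [{(1, 3)}], "rule5": [{(1, 5)}]}
accept, reject, overlap = policy("Z1", zone_abs, dropped)
print(sorted(overlap))  # [(1, 5)]
```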

In the example in Fig. 3, the "accept" formula for origin
zone Z1 is

[10.192.29.1-255 : ¬{10.192.29.1-255, 209.85.153.85, 202.65.23.2}]

which corresponds to Z2.abs.orig ∨ Z4.abs.orig. The "reject"
formula for Z1 is

[10.192.29.1-255 : {202.65.23.2, 209.85.153.85}]

which corresponds to 1.dropped_packets ∨ 5.dropped_packets,
where 1 and 5 are rule numbers in Fig. 1.

6. CONCLUSIONS AND FUTURE WORK

We presented a novel abstract-interpretation based approach
for packet flow analysis in IP networks. We provided
two different variants of the approach, for inferring different
properties, and provided formal claims of precision of the
analysis. We also illustrated the flexibility of abstract
interpretation in trading off precision for efficiency gains. While
we have taken the first steps in this direction, there are several
more complex packet-flow analysis settings to which we
would like to extend abstract interpretation. These include
(a) accounting precisely for transient changes in network
configuration and topology (transient changes are modeled
by the transitive-closure-based approach of Xie et al. [14]),
(b) addressing connection-oriented routing (i.e., stateful
filters), and (c) answering (restricted) forms of temporal
properties of networks. These settings lead to a much larger and
richer state space than what we have considered in this work.
Previous approaches have not addressed all these issues
together; our belief is that abstraction will be a key ingredient
in addressing them with reasonable precision and scalability.
APPENDIX

A. TRANSFER FUNCTIONS 

A.1 Transfer function for a filtering rule

Function filter_ruleTF(rule : a filtering rule, p ∈ AbsPk)
Output: (Accepted ∈ 2^AbsPk, Unmatched ∈ 2^AbsPk).
{Accepted is a set containing zero or one abstract packets that
represent the concrete packets represented by p that are accepted
by the rule rule. Unmatched is a set containing zero or one
abstract packets that represent the concrete packets represented
by p that do not match rule.grd.}
Accepted ← ∅, Unmatched ← ∅
if rule.act == DROP then
  if p.curr ∧ (¬rule.grd) ≠ false then
    p1 ← p
    p1.curr ← p1.curr ∧ (¬rule.grd)
    p1.orig ← p1.orig ∧ ¬reduce(rule.grd, p1)   {Variant 2}
    Unmatched ← {p1}
  end if
  if p.curr ∧ rule.grd ≠ false then
    p2 ← p
    p2.curr ← p2.curr ∧ rule.grd
    p2.orig ← p2.orig ∧ reduce(rule.grd, p2)   {Variant 2}
    rule.dropped_packets ← rule.dropped_packets ∪ {p2.orig}
        {Inferring policy}
  end if
else if rule.act == ACCEPT then
  if p.curr ∧ rule.grd ≠ false then
    p1 ← p
    p1.curr ← p1.curr ∧ rule.grd
    p1.orig ← p1.orig ∧ reduce(rule.grd, p1)   {Variant 2}
    Accepted ← {p1}
  end if
  if p.curr ∧ (¬rule.grd) ≠ false then
    p2 ← p
    p2.curr ← p2.curr ∧ (¬rule.grd)
    p2.orig ← p2.orig ∧ ¬reduce(rule.grd, p2)   {Variant 2}
    Unmatched ← {p2}
  end if
end if

return (Accepted, Unmatched)

Subroutine reduce(g : a rule's guard, p ∈ AbsPk)

Output: A reduced guard g' which does not refer to NATed
fields in p.

We assume g to be a conjunction of atomic predicates, each of
which refers to some field in an abstract packet.
Let g' be the conjunction of the atomic predicates in g that do
not refer to any field l such that p.ifNated.b_l is 1 (this
conjunction is true if there are no such atomic predicates).
return g'

A.2 Transfer function for a filtering table

Function filter_tableTF(t : a filtering table, In ∈ 2^AbsPk)
Output: A set of abstract packets that represent the concrete
packets represented by In that are accepted by some rule in
the filtering table t.
pSet ← In, Accepted ← ∅
for all filtering rules r in t, in order do
  pSet' ← ∅
  for all p ∈ pSet do
    (Acc, Unmatched) ← filter_ruleTF(r, p)
    pSet' ← pSet' ∪ Unmatched
    Accepted ← Accepted ∪ Acc
  end for
  pSet ← pSet'
end for

return Accepted
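A runnable Python sketch of filter_ruleTF and filter_tableTF, under a toy encoding of our own: formulas are explicit frozensets of header tuples, a guard is a dict from field index to allowed values, and ifNated is a tuple of booleans. Class and function names are hypothetical. One deliberate difference from the pseudocode: for the unmatched part, this sketch refines p.orig by the negated guard only when the guard mentions no NATed field and otherwise leaves p.orig unchanged, because ¬reduce(rule.grd, p) is false when every atom of the guard refers to a NATed field.

```python
from dataclasses import dataclass, field, replace

DROP, ACCEPT = "DROP", "ACCEPT"


@dataclass(frozen=True)
class AbsPacket:
    curr: frozenset  # current headers (tuples of field values)
    orig: frozenset  # original headers
    nated: tuple     # nated[f] is True once field f has been NATed


@dataclass
class FilterRule:
    grd: dict  # field index -> allowed values; a conjunction of atomic predicates
    act: str   # DROP or ACCEPT
    dropped_packets: list = field(default_factory=list)  # orig sets ("Inferring policy")


def sat(h, g):
    return all(h[f] in vals for f, vals in g.items())


def split(rule, p):
    """Matched and unmatched parts of p (None when empty), refining curr and orig."""
    red = {f: v for f, v in rule.grd.items() if not p.nated[f]}  # reduce(rule.grd, p)
    m_curr = frozenset(h for h in p.curr if sat(h, rule.grd))
    u_curr = p.curr - m_curr
    matched = None
    if m_curr:
        matched = replace(p, curr=m_curr, orig=frozenset(h for h in p.orig if sat(h, red)))
    unmatched = None
    if u_curr:
        # Conservative choice of this sketch: use the negated guard on orig only
        # when the guard mentions no NATed field; otherwise keep orig unchanged.
        u_orig = frozenset(h for h in p.orig if not sat(h, rule.grd)) if red == rule.grd else p.orig
        unmatched = replace(p, curr=u_curr, orig=u_orig)
    return matched, unmatched


def filter_rule_tf(rule, p):
    """Return (Accepted, Unmatched), each a set with zero or one abstract packet."""
    matched, unmatched = split(rule, p)
    accepted = set()
    if matched is not None:
        if rule.act == ACCEPT:
            accepted.add(matched)
        else:
            rule.dropped_packets.append(matched.orig)
    return accepted, ({unmatched} if unmatched is not None else set())


def filter_table_tf(table, packets):
    """Rules are tried in order; packets matched by no rule are not returned."""
    pset, accepted = set(packets), set()
    for rule in table:
        next_pset = set()
        for p in pset:
            acc, unm = filter_rule_tf(rule, p)
            accepted |= acc
            next_pset |= unm
        pset = next_pset
    return accepted


hdrs = frozenset({(1, 2), (1, 3), (2, 2)})
table = [FilterRule({1: {3}}, DROP), FilterRule({0: {1}}, ACCEPT)]
out = filter_table_tf(table, {AbsPacket(hdrs, hdrs, (False, False))})
print([sorted(q.curr) for q in out])  # [[(1, 2)]]
print(table[0].dropped_packets)       # [frozenset({(1, 3)})]
```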

A.3 Transfer function for a NATing rule

Function nat_ruleTF(rule : a NATing rule, p ∈ AbsPk)
Output: (Matched ∈ 2^AbsPk, Unmatched ∈ 2^AbsPk).
{Matched is a set containing zero or one abstract packets that
represent the concrete packets represented by p that are matched
by the rule rule, and hence will not be passed to subsequent
rules in the chain. Unmatched is a set containing zero or one
abstract packets that represent the concrete packets represented
by p that do not match rule.grd.}
Matched ← ∅, Unmatched ← ∅
if p.curr ∧ rule.grd ≠ false then
  p1 ← p
  p1.curr ← p1.curr ∧ rule.grd
  p1.orig ← p1.orig ∧ reduce(rule.grd, p1)   {Variant 2}
  if p1.ifNated.b_{rule.NAT_field} = 0 then   {Variant 2}
    p1 ← update_original_packet(p1, rule)
  end if
  p1.ifNated.b_{rule.NAT_field} ← 1   {Variant 2}
  p1 ← natPacket(p1, rule)
  Matched ← {p1}
end if

if p.curr ∧ (¬rule.grd) ≠ false then
  p2 ← p
  p2.curr ← p2.curr ∧ (¬rule.grd)
  p2.orig ← p2.orig ∧ ¬reduce(rule.grd, p2)   {Variant 2}
  Unmatched ← {p2}
end if

return (Matched, Unmatched)

A.4 Transfer function for a NATing table

Function nat_tableTF(t : a dnat or snat table, In ∈ 2^AbsPk)
Output: A set of abstract packets that represent the concrete
packets represented by In after they are transformed by the
NATing rules in t.
pSet ← In, Out ← ∅
for all NATing rules r in t, in order do
  pSet' ← ∅
  for all p ∈ pSet do
    (Matched, Unmatched) ← nat_ruleTF(r, p)
    pSet' ← pSet' ∪ Unmatched
    Out ← Out ∪ Matched
  end for
  pSet ← pSet'
end for

return Out ∪ pSet
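A runnable Python sketch of nat_ruleTF and nat_tableTF in the same toy encoding (explicit frozensets of header tuples for formulas, a dict from field index to allowed values for a guard, and a rule action given as the set of replacement values for one field). The NatRule class and all function names are hypothetical. As a deliberate, conservative choice of this sketch, the unmatched part refines p.orig by the negated guard only when the guard mentions no NATed field.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AbsPacket:
    curr: frozenset  # current headers (tuples of field values)
    orig: frozenset  # original headers
    nated: tuple     # nated[f] is True once field f has been NATed


@dataclass
class NatRule:
    grd: dict       # field index -> allowed values (conjunction)
    nat_field: int  # index of the field rewritten by the rule
    act: frozenset  # range of replacement values for that field


def sat(h, g):
    return all(h[f] in vals for f, vals in g.items())


def set_field(t, f, v):
    return t[:f] + (v,) + t[f + 1:]


def nat_rule_tf(rule, p):
    """Return (Matched, Unmatched), each a set with zero or one abstract packet."""
    red = {f: v for f, v in rule.grd.items() if not p.nated[f]}  # reduce(rule.grd, p)
    m_curr = frozenset(h for h in p.curr if sat(h, rule.grd))
    u_curr = p.curr - m_curr
    matched, unmatched = set(), set()
    if m_curr:
        f = rule.nat_field
        p1 = replace(p, curr=m_curr, orig=frozenset(h for h in p.orig if sat(h, red)))
        if not p1.nated[f]:
            # update_original_packet: copy field f of curr into field f of orig
            vals = {h[f] for h in p1.curr}
            p1 = replace(p1, orig=frozenset(set_field(o, f, v) for o in p1.orig for v in vals))
        p1 = replace(p1, nated=set_field(p1.nated, f, True))
        # natPacket: overwrite field f of curr with every value in rule.act
        p1 = replace(p1, curr=frozenset(set_field(h, f, v) for h in p1.curr for v in rule.act))
        matched.add(p1)
    if u_curr:
        # Conservative choice of this sketch: use the negated guard on orig only
        # when the guard mentions no NATed field; otherwise keep orig unchanged.
        u_orig = frozenset(h for h in p.orig if not sat(h, rule.grd)) if red == rule.grd else p.orig
        unmatched.add(replace(p, curr=u_curr, orig=u_orig))
    return matched, unmatched


def nat_table_tf(table, packets):
    """Rules are tried in order; packets matched by no rule leave unchanged."""
    pset, out = set(packets), set()
    for rule in table:
        next_pset = set()
        for p in pset:
            m, u = nat_rule_tf(rule, p)
            out |= m
            next_pset |= u
        pset = next_pset
    return out | pset


hdrs = frozenset({(1, 5), (2, 5)})
snat = [NatRule(grd={0: {1}}, nat_field=0, act=frozenset({9}))]
result = nat_table_tf(snat, {AbsPacket(hdrs, hdrs, (False, False))})
for q in sorted(result, key=lambda q: sorted(q.curr)):
    print(sorted(q.curr), sorted(q.orig), q.nated)
# [(2, 5)] [(2, 5)] (False, False)
# [(9, 5)] [(1, 5)] (True, False)
```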

A.5 Transfer function for a link

Function f_(l1,l2)(In : a set of abstract packets)
Output: A set of abstract packets.
S ← nat_tableTF(node(l1).dnat, In)
S ← filter_tableTF(node(l1).filt, S)
S ← nat_tableTF(node(l1).snat, S)
Construct a filtering table t with a single filtering rule r that
accepts exactly the packets satisfying the formula node(l1).rt(l2),
so that all other packets are dropped.
S ← filter_tableTF(t, S)
return ⋃ S
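The order of the four stages in the link transfer function can be sketched as a plain composition. To keep it short, this sketch takes the three table transfer functions as parameters and represents routing as a predicate on current headers; abstract packets are simplified to bare sets of current headers, so none of the Variant 2 bookkeeping appears. All names here are hypothetical.

```python
from typing import Callable, FrozenSet, Set, Tuple

Header = Tuple[int, ...]
AbsPkt = FrozenSet[Header]  # simplified: only the curr formula, as a header set
TableTF = Callable[[Set[AbsPkt]], Set[AbsPkt]]


def route_filter(route, packets):
    """Single-rule filtering table: keep only headers satisfying node(l1).rt(l2)."""
    out = set()
    for p in packets:
        kept = frozenset(h for h in p if route(h))
        if kept:
            out.add(kept)
    return out


def link_tf(packets, dnat_tf, filt_tf, snat_tf, route):
    """dnat table, then filter table, then snat table, then routing towards l2."""
    s = dnat_tf(packets)
    s = filt_tf(s)
    s = snat_tf(s)
    return route_filter(route, s)


# Toy run: identity tables, and a route to l2 that only carries dst == 2.
identity: TableTF = lambda s: set(s)
start = {frozenset({(1, 2), (1, 3)})}
print(link_tf(start, identity, identity, identity, lambda h: h[1] == 2))
# -> {frozenset({(1, 2)})}
```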

A.6 Transfer functions for the natPacket and update_original_packet subroutines

Subroutine natPacket(p ∈ AbsPk, r : a NATing rule)
Output: A copy of packet p in which the field r.NAT_field of
p.curr has been overwritten with the range of values r.act.

Subroutine update_original_packet(p ∈ AbsPk, r : a NATing rule)
Output: A copy of abstract packet p in which the field
r.NAT_field of p.orig has been overwritten with the contents
of field r.NAT_field of p.curr.

We do not provide a formal definition of the above two
subroutines, which need to simulate the updating of fields by
formula manipulation. Rather, we provide an illustration of
how they work. Consider an example where the packet
header has two fields, with 2 bits representing each field.

Let p be the given packet and let p.curr be defined by the
formula:

$((b_1 \land \lnot b_2) \lor (b_1 \land b_2)) \land \lnot b_3 \land b_4$

where b1, b2, b3, and b4 are the four bits in the header of
p.curr.

Now let the given NATing rule be r, and let r.grd be true. Let
the action of r be to change the value of the first field in the packet
header to 00. Then the new value of p.curr, computed by
natPacket(_, _), is given by the formula:

$(((b_1 \land \lnot b_2) \lor (b_1 \land b_2)) \land \lnot b_3 \land b_4) \land$
$(\lnot b'_1 \land \lnot b'_2 \land (b'_3 = b_3) \land (b'_4 = b_4))$

wherein the primed variables are now treated as the free
variables. The first line in the formula above represents the
original value of p.curr, while the second line captures the
fact that bits b1 and b2 are both set to 0, while bits b3 and
b4 are preserved.
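The formula can be checked by brute force over all 16 assignments. The sketch below uses plain enumeration rather than the formula manipulation the text has in mind, and its helper names are hypothetical: it keeps every primed assignment for which some unprimed assignment satisfies both lines, i.e., it eliminates the unprimed bits existentially.

```python
from itertools import product


def curr_old(b1, b2, b3, b4):
    # ((b1 and not b2) or (b1 and b2)) and not b3 and b4
    return ((b1 and not b2) or (b1 and b2)) and not b3 and b4


def nat_relation(b, bp):
    # not b1' and not b2' and (b3' == b3) and (b4' == b4): field 1 becomes 00
    return not bp[0] and not bp[1] and bp[2] == b[2] and bp[3] == b[3]


bits = list(product([False, True], repeat=4))
# Keep primed assignments for which SOME unprimed assignment satisfies both lines.
new_curr = {bp for bp in bits if any(curr_old(*b) and nat_relation(b, bp) for b in bits)}
print([tuple(int(x) for x in bp) for bp in sorted(new_curr)])  # [(0, 0, 0, 1)]
```

The only surviving header is 0001, so after simplification the new p.curr is ¬b1 ∧ ¬b2 ∧ ¬b3 ∧ b4.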

Suppose that, before we do the NATing above, the p.ifNated field
in the packet p is 01, indicating that field 1 has not been
NATed previously and field 2 has been NATed previously.
Also, let p.orig be defined by the formula:

$\lnot c_3 \land \lnot c_4$

where c1, c2, c3, and c4 are the four bits in the header of
p.orig. Since we are now updating the first two bits of
p.orig, and these have not been NATed before, we
update p.orig to the following formula:

$(((b_1 \land \lnot b_2) \lor (b_1 \land b_2)) \land \lnot b_3 \land b_4) \land$

$(\lnot c_3 \land \lnot c_4) \land$

$((c'_1 = b_1) \land (c'_2 = b_2) \land (c'_3 = c_3) \land (c'_4 = c_4))$

wherein the primed variables are to be treated as the free
variables. The first line above represents the old value of
p.curr; the second line represents the old value of p.orig;
the last line indicates that the new values of bits c1 and c2
are the same as the old values of bits b1 and b2,
respectively, and that bits c3 and c4 are preserved. This
captures the fact that the first two bits of p.curr are being
copied to p.orig.

We can simplify both formulas generated above by first
eliminating the unprimed variables (by existential
quantification), and then renaming the primed variables
to their corresponding unprimed form.
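The same brute-force check applies to the update of p.orig. This sketch (again plain enumeration, with hypothetical helper names) eliminates the unprimed b and c bits existentially and keeps the primed c bits as the free variables.

```python
from itertools import product

bits = list(product([False, True], repeat=4))


def curr_old(b):
    b1, b2, b3, b4 = b
    return ((b1 and not b2) or (b1 and b2)) and not b3 and b4


def orig_old(c):
    return not c[2] and not c[3]


def copy_relation(b, c, cp):
    # c1' = b1, c2' = b2 (field 1 copied from curr), c3' = c3, c4' = c4
    return cp[0] == b[0] and cp[1] == b[1] and cp[2] == c[2] and cp[3] == c[3]


# Existentially eliminate the unprimed b's and c's, keeping the primed c's free.
new_orig = {
    cp for cp in bits
    if any(curr_old(b) and orig_old(c) and copy_relation(b, c, cp) for b in bits for c in bits)
}
print(sorted(tuple(int(x) for x in cp) for cp in new_orig))
# -> [(1, 0, 0, 0), (1, 1, 0, 0)], i.e. c1 and not c3 and not c4
```

After renaming, the updated p.orig is equivalent to c1 ∧ ¬c3 ∧ ¬c4: bit c1 inherits the constraint on b1 from p.curr, bit c2 is unconstrained because b2 was, and c3 and c4 keep their old constraints.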



